We present a multiagent reinforcement learning software package for dynamic training interactions such as round-robin, adaptive, and ad-hoc training. Our package is designed around flexible agent objects that can easily be configured to support different training interactions, and it handles fully general multiagent environments with mixed rewards and n agents. Built on top of Stable-Baselines3, our package works directly with existing powerful deep RL algorithms. Finally, PantheonRL comes with an intuitive yet functional web user interface for configuring experiments and launching multiple asynchronous jobs. Our package can be found at https://github.com/stanford-iliad/pantheonrl.
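The round-robin training interaction described above can be illustrated with a minimal, self-contained sketch: a toy value-learning agent in a two-action coordination game, where every ordered pair of agents trains together in turn. This is a schematic stand-in for illustration only, not PantheonRL's actual agent API (which wraps Stable-Baselines3 learners):

```python
import itertools
import random

class Agent:
    """Toy agent with per-action value estimates (stand-in for an RL learner)."""
    def __init__(self, n_actions=2, lr=0.1):
        self.q = [0.0] * n_actions
        self.lr = lr

    def act(self):
        # epsilon-greedy over current value estimates
        if random.random() < 0.2:
            return random.randrange(len(self.q))
        return max(range(len(self.q)), key=self.q.__getitem__)

    def update(self, action, reward):
        self.q[action] += self.lr * (reward - self.q[action])

def coordination_reward(a1, a2):
    # both agents are rewarded only when they choose the same action
    return 1.0 if a1 == a2 else 0.0

def round_robin_train(agents, episodes_per_pair=200):
    # every ordered pair of distinct agents trains together in turn
    for left, right in itertools.permutations(agents, 2):
        for _ in range(episodes_per_pair):
            a1, a2 = left.act(), right.act()
            r = coordination_reward(a1, a2)
            left.update(a1, r)
            right.update(a2, r)

random.seed(0)
agents = [Agent() for _ in range(3)]
round_robin_train(agents)
```

After round-robin training, all agents settle on the same coordination convention, which is the point of pairing every agent with every partner rather than training against a fixed one.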
Reward learning is a fundamental problem in human-robot interaction, enabling robots to act in alignment with what their human users want. Many preference-based learning algorithms and active querying techniques have been proposed as solutions to this problem. In this paper, we present APReL, a library for active preference-based reward learning algorithms, which enables researchers and practitioners to experiment with existing techniques and easily develop their own algorithms for the various modules of the problem. APReL is available at https://github.com/stanford-iliad/aprel.
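The preference-based reward learning setting can be sketched with a toy Bradley-Terry model: a linear reward over trajectory features, fit to pairwise comparisons by logistic gradient ascent. This is a generic illustration of the problem APReL modularizes, not the library's own API; all names below are hypothetical:

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def learn_reward(queries, answers, dim, lr=0.5, epochs=2000):
    """Fit reward weights w from pairwise preferences under a Bradley-Terry
    (logistic) likelihood: P(a preferred over b) = sigmoid(w . (phi_a - phi_b))."""
    w = [0.0] * dim
    for _ in range(epochs):
        for (phi_a, phi_b), prefers_a in zip(queries, answers):
            diff = [a - b for a, b in zip(phi_a, phi_b)]
            p = 1.0 / (1.0 + math.exp(-dot(w, diff)))
            grad = (1.0 if prefers_a else 0.0) - p  # logistic-likelihood gradient
            w = [wi + lr * grad * di for wi, di in zip(w, diff)]
    return w

random.seed(1)
true_w = [1.0, -2.0]  # hidden reward the simulated "human" answers with
queries = [([random.random() for _ in range(2)],
            [random.random() for _ in range(2)]) for _ in range(30)]
answers = [dot(true_w, a) > dot(true_w, b) for a, b in queries]
w = learn_reward(queries, answers, dim=2)
```

The learned weights recover the sign structure of the hidden reward, so the robot can rank new trajectories the way the human would; an active querying module would additionally choose which pairs to ask about.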
Recent work in sim2real has successfully enabled robots to act in physical environments by training in simulation with a diverse "population" of environments (i.e. domain randomization). In this work, we focus on enabling generalization in assistive tasks: tasks in which the robot is acting to assist a user (e.g. helping someone with motor impairments with bathing or with scratching an itch). Such tasks are particularly interesting relative to prior sim2real successes because the environment now contains a human who is also acting. This complicates the problem because the diversity of human users (instead of merely physical environment parameters) is more difficult to capture in a population, thus increasing the likelihood of encountering out-of-distribution (OOD) human policies at test time. We advocate that generalization to such OOD policies benefits from (1) learning a good latent representation for human policies that test-time humans can accurately be mapped to, and (2) making that representation adaptable with test-time interaction data, instead of relying on it to perfectly capture the space of human policies based on the simulated population only. We study how to best learn such a representation by evaluating on purposefully constructed OOD test policies. We find that sim2real methods that encode environment (or population) parameters and work well in tasks that robots do in isolation, do not work well in assistance. In assistance, it seems crucial to train the representation based on the history of interaction directly, because that is what the robot will have access to at test time. Further, training these representations to then predict human actions not only gives them better structure, but also enables them to be fine-tuned at test-time, when the robot observes the partner act. Project page: https://adaptive-caregiver.github.io.
For policymakers wishing to make evidence-based decisions, one of the challenges is how to combine the relevant information and evidence in a coherent and defensible manner in order to formulate and evaluate candidate policies. Policymakers often need to rely on experts with disparate fields of expertise when making policy choices in complex, multi-faceted, dynamic environments such as those dealing with ecosystem services. The pressures affecting the survival and pollination capabilities of honey bees (Apis mellifera), wild bees and other pollinators are well-documented, but our understanding of them is incomplete. In order to estimate the potential effectiveness of various candidate policies to support pollination services, there is an urgent need to quantify the effect of various combinations of variables on the pollination ecosystem service, utilising available information, models and expert judgement. In this paper, we present a new application of the integrating decision support system methodology for combining inputs from multiple panels of experts to evaluate policies to support an abundant pollinator population.
Finetuning image-text models such as CLIP achieves state-of-the-art accuracies on a variety of benchmarks. However, recent works like WiseFT (Wortsman et al., 2021) and LP-FT (Kumar et al., 2022) have shown that even subtle differences in the finetuning process can lead to surprisingly large differences in the final performance, both for in-distribution (ID) and out-of-distribution (OOD) data. In this work, we show that a natural and simple approach of mimicking contrastive pretraining consistently outperforms alternative finetuning approaches. Specifically, we cast downstream class labels as text prompts and continue optimizing the contrastive loss between image embeddings and class-descriptive prompt embeddings (contrastive finetuning). Our method consistently outperforms baselines across 7 distribution shifts, 6 transfer learning, and 3 few-shot learning benchmarks. On WILDS-iWILDCam, our proposed approach FLYP outperforms the top of the leaderboard by $2.3\%$ ID and $2.7\%$ OOD, giving the highest reported accuracy. Averaged across 7 OOD datasets (2 WILDS and 5 ImageNet associated shifts), FLYP gives gains of $4.2\%$ OOD over standard finetuning and outperforms the current state of the art (LP-FT) by more than $1\%$ both ID and OOD. Similarly, on 3 few-shot learning benchmarks, our approach gives gains up to $4.6\%$ over standard finetuning and $4.4\%$ over the state of the art. In total, these benchmarks establish contrastive finetuning as a simple, intuitive, and state-of-the-art approach for supervised finetuning of image-text models like CLIP. Code is available at https://github.com/locuslab/FLYP.
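The contrastive finetuning objective can be made concrete with a small sketch: class labels become prompt strings, and the same symmetric InfoNCE loss used in CLIP pretraining is applied between image embeddings and prompt embeddings. The toy two-dimensional embeddings below stand in for real CLIP encoder outputs (an assumption for illustration):

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def clip_style_contrastive_loss(img_embs, txt_embs, temperature=0.07):
    """Symmetric InfoNCE over an (image, prompt) batch: each image should match
    its own class prompt against all others in the batch, and vice versa."""
    img = [normalize(v) for v in img_embs]
    txt = [normalize(v) for v in txt_embs]
    n = len(img)
    # cosine-similarity logits scaled by the temperature
    logits = [[sum(a * b for a, b in zip(img[i], txt[j])) / temperature
               for j in range(n)] for i in range(n)]

    def ce(row, target):
        m = max(row)  # stabilised log-sum-exp cross-entropy
        logsumexp = m + math.log(sum(math.exp(x - m) for x in row))
        return logsumexp - row[target]

    loss_i2t = sum(ce(logits[i], i) for i in range(n)) / n
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = sum(ce(cols[j], j) for j in range(n)) / n
    return 0.5 * (loss_i2t + loss_t2i)

# toy batch: image embeddings already close to their matching prompt embeddings
imgs = [[1.0, 0.1], [0.1, 1.0]]
prompts = [[1.0, 0.0], [0.0, 1.0]]  # e.g. "a photo of a cat" / "a photo of a dog"
aligned = clip_style_contrastive_loss(imgs, prompts)
shuffled = clip_style_contrastive_loss(imgs, prompts[::-1])
```

Minimizing this loss pulls each image toward its class-descriptive prompt and pushes it away from the other classes' prompts, which is exactly the geometry the pretrained model was optimized for; misaligned pairs yield a much larger loss.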
Agriculture is at the heart of the solution to achieve sustainability in feeding the world population, but advancing our understanding of how agricultural output responds to climatic variability is still needed. Precision Agriculture (PA), which is a management strategy that uses technology such as remote sensing, Geographical Information System (GIS), and machine learning for decision making in the field, has emerged as a promising approach to enhance crop production, increase yield, and reduce water and nutrient losses and environmental impacts. In this context, multiple models to predict agricultural phenotypes, such as crop yield, from genomics (G), environment (E), weather and soil, and field management practices (M) have been developed. These models have traditionally been based on mechanistic or statistical approaches. However, AI approaches are intrinsically well-suited to model complex interactions and have more recently been developed, outperforming classical methods. Here, we present a Natural Language Processing (NLP)-based neural network architecture to process the G, E and M inputs and their interactions. We show that by modeling DNA as natural language, our approach performs better than previous approaches when tested for new environments and similarly to other approaches for unseen seed varieties.
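One common way to "model DNA as natural language" is to tokenize sequences into overlapping k-mers, which can then feed a standard NLP embedding layer. The sketch below illustrates that general idea; the paper's actual tokenization scheme may differ:

```python
def kmer_tokenize(seq, k=3, stride=1):
    """Split a DNA sequence into overlapping k-mer 'words' so that standard
    NLP machinery (embeddings, attention) can consume the genotype."""
    seq = seq.upper()
    assert set(seq) <= set("ACGTN"), "unexpected nucleotide symbol"
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]

def build_vocab(token_lists):
    # map each distinct k-mer to an integer id, as an embedding layer expects
    vocab = {}
    for tokens in token_lists:
        for t in tokens:
            vocab.setdefault(t, len(vocab))
    return vocab

tokens = kmer_tokenize("ACGTAC", k=3)
vocab = build_vocab([tokens])
ids = [vocab[t] for t in tokens]
```

With `k=3` and stride 1, "ACGTAC" becomes the four "words" ACG, CGT, GTA, TAC; the resulting integer ids are what a G-input branch of the network would embed alongside the E and M features.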
Chain event graphs are a family of probabilistic graphical models that generalise Bayesian networks and have been successfully applied to a wide range of domains. Unlike Bayesian networks, these models can encode context-specific conditional independencies as well as asymmetric developments within the evolution of a process. More recently, new model classes belonging to the chain event graph family have been developed for modelling time-to-event data to study the temporal dynamics of a process. However, existing model selection algorithms for chain event graphs and their variants rely on all parameters having conjugate priors. This is unrealistic for many real-world applications. In this paper, we propose a mixture modelling approach to model selection in chain event graphs that does not rely on conjugacy. Moreover, we also show that this methodology is more amenable to being robustly scaled than the existing model selection algorithms used for this family. We demonstrate our techniques on simulated datasets.
The impact of demographic factors (e.g. age, gender, race) on automated face recognition systems has been studied extensively. However, the effect of digitally modified demographic and facial attributes on face recognition has received relatively little attention. In this work, we study the impact of attribute manipulations induced via generative adversarial networks (GANs) on face recognition performance. We conduct experiments on the CelebA dataset by intentionally modifying thirteen attributes using AttGAN and STGAN and evaluating their effect on two deep-learning-based face verification methods, ArcFace and VGGFace. Our findings indicate that some attribute manipulations involving eyeglasses and digital alteration of sex cues can severely impair face recognition, by as much as 73%, and warrant further analysis.
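The verification protocol underlying these experiments reduces to comparing face embeddings under a similarity threshold: an attribute edit that displaces the embedding far enough turns a genuine pair into a false non-match. A minimal sketch with made-up three-dimensional embeddings (real ArcFace/VGGFace embeddings are 512-dimensional; the vectors and threshold here are illustrative assumptions):

```python
import math

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den

def verify(emb_a, emb_b, threshold=0.5):
    # declare a genuine pair iff embedding similarity clears the threshold
    return cosine(emb_a, emb_b) >= threshold

original  = [0.9, 0.1, 0.4]    # embedding of the unaltered face
same_face = [0.85, 0.15, 0.42] # second capture of the same identity
edited    = [0.1, 0.9, 0.2]    # after a drastic attribute edit (e.g. eyeglasses)

genuine_pair_accepted = verify(original, same_face)
edited_pair_accepted = verify(original, edited)
```

Measuring how often `verify` fails on (original, edited) pairs, attribute by attribute, is how a drop such as the reported 73% would be quantified.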
Aspect-based sentiment analysis (ABSA) involves identifying the sentiment polarity of a review sentence towards a given aspect. Deep-learning sequence models such as RNNs, LSTMs, and GRUs are the current state of the art for inferring sentiment polarity. These methods capture the contextual relationships between the words of a review sentence well, but they struggle to capture long-range dependencies. The attention mechanism plays an important role by focusing on the most relevant parts of the sentence. In ABSA, the position of the aspect is crucial: words near the aspect contribute more when determining the sentiment towards it. We therefore propose a method that captures position information using the dependency parse tree and feeds it to the attention mechanism. Using this type of position information, rather than simple word-distance-based positions, improves the performance of deep learning models. We conduct experiments on the SemEval'14 dataset to demonstrate the effect of dependency-parse-based position information on ABSA.
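The dependency-based position idea can be sketched directly: measure each word's hop distance to the aspect in the parse tree and turn it into a weight that biases attention toward nearby words. The toy parse below is hand-written for illustration (a real system would obtain the edges from a dependency parser such as spaCy or Stanford CoreNLP):

```python
from collections import deque

def tree_distances(edges, n, source):
    """BFS hop counts from the aspect token over undirected dependency edges."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = [None] * n
    dist[source] = 0
    q = deque([source])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if dist[v] is None:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def position_weights(dist):
    # closer to the aspect in the parse tree -> larger attention bias
    return [1.0 / (1.0 + d) for d in dist]

# "The battery life is great" with aspect "battery" (index 1); toy edges:
# det(life, The), compound(life, battery), nsubj(great, life), cop(great, is)
tokens = ["The", "battery", "life", "is", "great"]
edges = [(2, 0), (2, 1), (4, 2), (4, 3)]
dist = tree_distances(edges, len(tokens), source=1)
weights = position_weights(dist)
```

Note that the sentiment word "great" is three words away from "battery" in the surface order but only two hops away in the tree, so the tree-based weight ranks it above the function word "is"; this is the advantage over simple word-distance positions.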
Sense-react systems such as robotics and AR/VR must take highly responsive real-time actions, driven by complex decisions involving pipelines of sensing, perception, planning, and reaction tasks. These tasks must be scheduled on resource-constrained devices so that the performance goals and requirements of the application are met. This is a difficult scheduling problem that requires handling multiple scheduling dimensions as well as variations in resource usage and availability. In practice, system designers manually tune parameters for their specific hardware and application, which results in poor generalization and an increased development burden. In this work, we highlight the emerging need for scheduling CPU resources at runtime in sense-react systems. We study three canonical applications (face tracking, robot navigation, and VR) to first understand the key scheduling requirements of such systems. With this understanding, we develop a scheduling framework, Catan, which dynamically schedules compute resources across the different components of an application to meet the specified application requirements. Through experiments with a prototype implemented on the widely used robotics framework ROS and on an open-source AR/VR platform, we show the impact of system scheduling on meeting the performance goals of the three applications, how Catan achieves better application performance than hand-tuned configurations, and how it dynamically adapts to runtime variations.
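The core resource-allocation question can be illustrated with a simple model (an illustrative sketch under assumed cost models, not Catan's actual policy): if each pipeline stage's latency scales as work/share, then splitting a fixed CPU budget in proportion to the square root of each stage's measured work minimizes end-to-end latency, and re-running the split whenever measurements change gives runtime adaptation:

```python
import math

def allocate_shares(work, budget=1.0):
    """Split a CPU budget across pipeline stages. Modelling stage latency as
    work_i / share_i, the split minimising sum_i(work_i / share_i) subject to
    sum_i(share_i) = budget puts share_i proportional to sqrt(work_i)."""
    roots = [math.sqrt(w) for w in work]
    total = sum(roots)
    return [budget * r / total for r in roots]

def end_to_end_latency(work, shares):
    # serial pipeline: total latency is the sum of per-stage latencies
    return sum(w / s for w, s in zip(work, shares))

# perception is the heavy stage this period; reallocating beats an even split
work = {"sensing": 1.0, "perception": 4.0, "planning": 1.0}
shares = allocate_shares(list(work.values()))
latency = end_to_end_latency(list(work.values()), shares)

even = [1.0 / 3] * 3
latency_even = end_to_end_latency(list(work.values()), even)
```

Re-invoking `allocate_shares` with fresh measurements each scheduling period is the adaptive element: when perception's work spikes at runtime, its share grows automatically instead of requiring hand-retuned configurations.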